Geometric and Statistical Properties of the Mean-Field HP Model, the LS Model and Real Protein Sequences
Because of their coarse-grained nature, lattice models are well suited to the study of the "designability problem", the phenomenon in which most of the roughly 16,000 proteins of known structure have their native conformations concentrated in a relatively small number, about 500, of topological classes of conformations. Here it is shown that on a lattice the most highly designable simulated protein structures are those that have the largest number of surface-core switchbacks. A combination of physical, mathematical and biological reasons for this phenomenon is given. By comparing the most foldable model peptides with protein sequences in the Protein Data Bank, it is shown that whereas different models may yield similar designabilities, predicted foldable peptides will simulate natural proteins only when the model incorporates the correct physics and biology, in this case if the main folding force arises from the differing hydrophobicity of the residues and not, say, from the steric hindrance effect caused by the differing sizes of the residues.



PACS numbers: 87.10.+e, 87.15.-v, 87.15.By
I. INTRODUCTION 

It is believed that the dynamical folding of a protein to its native conformation is determined by the amino acid sequence of the protein. Yet the folding of any particular protein is an extremely complex process; simulation of the folding of even a small protein remains an unsurmounted challenge to state-of-the-art computers. Nevertheless, a good understanding of a number of general features of protein folding has been acquired in computational studies using simple lattice models. One feature is the so-called funnel picture that leads to a two-state description of folding. Here the vertical dimension of the funnel represents the state of foldedness of the protein (or roughly its free energy), which increases (decreases) from the top towards the bottom of the funnel, and a cross-section of the funnel represents the conformation space accessible to the folding protein at a given state of foldedness. Near the top of the funnel, most conformations are freely accessible and folding proceeds extremely rapidly. As the folding progresses and the opening of the funnel narrows, accessibility of one conformation from another becomes increasingly restrictive, so that increasingly fewer pairs of conformations are connected by almost-equal-energy paths and folding correspondingly slows down. An alternative view is that the energy landscape becomes increasingly rugged. At some juncture the rate of decrease in the number of accessible conformations, hence the rate of decrease in entropy, is so large as to cause the rate of free-energy change as a function of foldedness to be positive, so that a free-energy barrier is formed that becomes an obstacle against further folding. At this point folding practically grinds to a halt and can proceed stochastically only on the very rare occasions that bring it over the barrier, after which the protein folds (and unfolds) relatively rapidly to its native conformation in an annealing-like process.

Another issue clarified by simple lattice models is the designability of "topological" classes of protein conformations. The designability of a conformation class is the number of proteins whose native conformations belong to the class. At the moment the number of proteins with known three-dimensional conformations in the Protein Data Bank (PDB) is of the order of 16,000 and is increasing rapidly, while the number of conformation classes has remained about 500 for some time and is not expected to grow beyond 1000. Even when the fact that many proteins in the PDB are homologues with similar structures is taken into account, the discrepancy between the number of non-homologous proteins and the number of conformation classes of observed native conformations is glaring. Because a class is in fact composed of many conformations that differ in detail (such differences could very well be important to the function of proteins), the problem of designability is best studied in coarse-grained models, such as lattice models, that disregard such details.

The simplest interacting lattice model is the HP model proposed by Dill et al., in which the 20 kinds of amino acids are divided into two types, hydrophobic (H) and polar (P). This model has been studied extensively by several groups in the last decade. A mean-field version of the model that yields tremendous simplification was used to study the designability problem, and it was found that the designabilities of structures vary greatly (the terms structures and conformation classes will be used interchangeably in this paper), and that only a tiny portion of structures are highly designable. Moreover, it was noted that highly designable structures seem to have patterns that emulate secondary structural motifs.

In a general Hamiltonian setting, the Hamiltonian $H$ can be viewed as a mapping of the peptide space $V$ to the conformation space $C$. When $C$ is sufficiently coarse grained, which is the case we consider, each point in $C$ is a topological class of native conformations. Then $H$ is a mapping of $V$ into such conformation classes in $C$. If we remove from $V$ all the peptides that are mapped by $H$ to more than one conformation class in $C$ (i.e., the degenerate cases), the remainder of $V$ is partitioned by $H$ into equivalence classes of peptides, with each peptide class being mapped to a single conformation class. Designability results from a highly skewed distribution of the size of the peptide classes. We shall call peptides belonging to peptide classes that are mapped to highly designable structures highly foldable peptides.

In earlier work the designability issue of the mean-field HP model was reduced to a purely geometric problem, which made it easy to discuss and visualize the skewed distribution of the size of peptide classes. It was however not made clear what characterizes those structures that are highly designable, nor was it demonstrated whether or not highly foldable peptides have anything to do with real proteins. In fact, whereas one can well imagine many Hamiltonians in lattice models yielding biased designability, it is not clear that any such Hamiltonian would yield foldable peptides that simulate real proteins.

In this paper, expanding on claims made in an earlier letter, the highly designable structures in the mean-field HP model will be characterized: they are those that have the largest number of surface-core switchbacks. It will also be shown that highly foldable peptides have a high similarity with real protein sequences in general and with segments of sequences that fold to α helices in particular.

To demonstrate a point made above, this paper also discusses a lattice model that exhibits designability but does not seem to be biologically correct. In the LS model, the 20 kinds of amino acids are divided into two types, large (L) and small (S), and it is assumed that the deciding factor in folding is the steric hindrance effect caused by the difference in the sizes of the amino acids. It was shown by Micheletti et al. that on a lattice, structures in the LS model too have uneven designability (there called encodability score); only a small portion of structures, also claimed to have protein-like secondary structures, are selected by large numbers of peptide sequences as unique ground states. It will be shown here that in spite of the fact that the LS model is mathematically almost equivalent to the mean-field HP model, highly foldable peptides in the LS model, unlike those in the mean-field HP model, do not match well with real protein sequences.

In the following two sections the mean-field HP model and the LS model are reviewed and it is shown that, notwithstanding their quite different physical contents, on square lattices the two models are mathematically close approximations of each other. In Section IV the geometrical properties of a two-dimensional square lattice and the way they restrict the space of structures, which are compact paths on the lattice, are discussed. In Section V it is shown that only a very small portion of the structures have the highest numbers of surface-core switchbacks and that, for both models, it is these structures that have the highest designabilities. Because the partition of amino acids in the HP model is based on hydrophobicity while that in the LS model is based on residue size, the highly foldable peptides are translated into different sets of "physical" peptides in the two models. In Section VI the highly foldable peptides in the two models are compared with real proteins in the Protein Data Bank and it is shown that the highly foldable peptides in the HP model match well with real protein sequences in general and with segments of sequences that fold to α helices in particular (but not well with segments of sequences that fold to β sheets), whereas those in the LS model match poorly with real protein sequences. Section VII gives an expanded discussion of our results. In an Appendix the most highly foldable peptides in the two models are given and compared.

II. THE HP MODEL 

The Hamiltonian of the HP model is:

$$H = \sum_{i<j} E_{p_i p_j}\, \Delta(\mathbf{r}_i - \mathbf{r}_j), \qquad (1)$$

where $p_i$ is the type, H for hydrophobic and P for polar, of the $i$th residue, or amino acid, in the peptide chain; $\Delta(\mathbf{r}_i - \mathbf{r}_j) = 1$ if $\mathbf{r}_i$ and $\mathbf{r}_j$ are nearest neighbors on the lattice but not adjacent along the peptide sequence, and $\Delta(\mathbf{r}_i - \mathbf{r}_j) = 0$ otherwise; $E_{p_i p_j}$ specifies the residue contact energies, which depend on the types of residues in contact.
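As an illustration of Eq. (1), the following minimal sketch evaluates the contact energy of a peptide laid along a self-avoiding lattice path. The function name `hp_energy`, the coordinate representation of the path and the 2x2 example are assumptions made for illustration only; the contact energies are the original HP values $(-1, 0, 0)$.

```python
from itertools import combinations

# Contact energies (E_HH, E_HP, E_PP) = (-1, 0, 0), the original HP values.
E = {("H", "H"): -1.0, ("H", "P"): 0.0, ("P", "H"): 0.0, ("P", "P"): 0.0}

def hp_energy(sequence, path, energies=E):
    """Sum E_{p_i p_j} over residue pairs that are nearest neighbours on the
    lattice but not adjacent along the chain (the Delta factor in Eq. (1))."""
    assert len(sequence) == len(path)
    total = 0.0
    for i, j in combinations(range(len(path)), 2):
        if j - i == 1:
            continue  # bonded neighbours along the chain are not contacts
        (xi, yi), (xj, yj) = path[i], path[j]
        if abs(xi - xj) + abs(yi - yj) == 1:  # nearest neighbours on the lattice
            total += energies[(sequence[i], sequence[j])]
    return total

# Hypothetical example: a compact path on a 2x2 lattice has exactly one
# non-bonded contact, between the first and the last residue.
square = [(0, 0), (1, 0), (1, 1), (0, 1)]
print(hp_energy("HPPH", square))  # -1.0
print(hp_energy("HPPP", square))  # 0.0
```

Only the pair (first, last) contributes here, so the energy is simply the contact energy of that pair.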

Several sets of contact energies $(E_{HH}, E_{HP}, E_{PP})$ have been used: $(-1, 0, 0)$ for the original HP model, $(-2.3, -1, 0)$ by Li et al., and (-7r, -1, 0) by Buchler and Goldstein. Li et al. suggested that the contact energies should satisfy the following constraints: 1) compact shapes have lower energies than non-compact shapes; 2) $E_{PP} > E_{HP} > E_{HH}$, so that hydrophobic residues are buried as much as possible; and 3) different types of residues tend to segregate, which is a condition induced by having $2E_{HP} > E_{PP} + E_{HH}$. In this work these will be adopted with the modification that 3) is replaced by the additive relation $2E_{HP} = E_{PP} + E_{HH}$.
Then the potential simplifies to:

$$E_{p_i p_j} = -(P_i + P_j), \qquad (2)$$

where $P_i = 1$ for an H and $P_i = 0$ for a P residue. Henceforth only structures that correspond to self-avoiding compact paths on a lattice will be considered.

On an $N \times N$ two-dimensional square lattice, there are four corner sites with coordination number $N_n = 2$, $4(N-2)$ side sites with $N_n = 3$ and $(N-2)^2$ core sites with $N_n = 4$. With the exception of the two ends of the peptide chain, which we ignore, each lattice point has $N_n - 2$ contacts. So the Hamiltonian Eq. (1) becomes:

$$H = -\Big(0 \times \sum_{i \in \mathrm{corner}} + 1 \times \sum_{i \in \mathrm{side}} + 2 \times \sum_{i \in \mathrm{core}}\Big) P_i = -\sum_{i} P_i - \sum_{i \in \mathrm{core}} P_i + \sum_{i \in \mathrm{corner}} P_i. \qquad (3)$$

The first term on the right-hand side of Eq. (3) is a constant for a given peptide sequence. It is independent of whatever conformation the peptide resides in and, since Eq. (3) will only be used here to determine the native structure of a particular peptide sequence, it will be omitted. The third term means that it is costly to put H residues in the corner sites. Since it is of order $1/N^2$ it too will be omitted. The Hamiltonian then simplifies to what is known as the mean-field HP model:

$$H(\mathbf{p}, \mathbf{s}) = -\mathbf{p} \cdot \mathbf{s} = \tfrac{1}{2}\left(|\mathbf{s} - \mathbf{p}|^2 - \mathbf{p}^2 - \mathbf{s}^2\right), \qquad (4)$$

where $\mathbf{p} = (p_1, p_2, \ldots, p_n)$, $n = N^2$, is the binary peptide sequence and $\mathbf{s} = (s_1, s_2, \ldots, s_n)$ is a binary structural sequence converted from a self-avoiding compact path on the lattice with the assignment: $s_i = 1$ (0) if the $i$th site of the structure is a core (surface) site. In this new form the Hamiltonian has an interpretation quite different from its original meaning. There it was an expression of inter-residue interaction. Here in Eq. (4) it is no longer inter-residue; rather it has the form of a site-dependent potential. With $\mathbf{s}^2$ fixed for a given lattice and $\mathbf{p}^2$ a constant for a given peptide sequence, both are irrelevant to the determination of the ground-state structure of the peptide. They will be ignored in the ensuing calculation. The Hamiltonian now reduces to one-half of $|\mathbf{s} - \mathbf{p}|^2$ and a neat geometric interpretation for it emerges. When $\mathbf{p}$ and $\mathbf{s}$ are viewed as $n$-component vectors, this quantity is just the Hamming distance between two corner points of a unit $n$-dimensional hypercube.

When the energy matrix elements are not additive, that is, when $E_{HH} = -2 - \gamma$ with $\gamma > 0$ as was used in earlier studies, the model cannot be reduced to the simple site-dependent form of Eq. (4). The effect of $\gamma$ is to stabilize the low-lying states in the mean-field model further by increasing the number of H-H contacts.

III. THE LS MODEL



It was shown by Micheletti et al. that in the LS model the distribution of the designability (called encodability score by the authors) of structures is similar to that in the mean-field HP model. The Hamiltonian of this model is

$$H = -\sum_{i} z_i(\Gamma)\, \Delta\big(z(\sigma_i) - z_i(\Gamma)\big), \qquad (5)$$

where $\sigma_i \in \{L, S\}$; $z(\sigma_i)$ is the maximal number of nearest contacts without steric repulsion belonging to residue $i$; on a square lattice, $z(\sigma_i)$ is equal to 1 (2) for L (S) residues inside the chain, and to 2 (3) for L (S) residues at chain ends; $z_i(\Gamma)$ is the number of contacts of the $i$th residue in a conformation $\Gamma$; and $\Delta(x)$ equals 1 if $x \ge 0$ and $-a < 0$ otherwise. The Hamiltonian implies that if the number of contacts of the $i$th residue is larger than $z(\sigma_i)$, then, owing to steric effects, each of its contacts contributes $+a$ to the energy instead of $-1$.
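The per-residue term of the LS Hamiltonian can be sketched as follows. This is a minimal illustration, assuming that the step function equals 1 for $x \ge 0$ and $-a$ otherwise, and that an interior residue has 0, 1 and 2 non-bonded contacts at corner, side and core sites of a compact structure; the function name `ls_site_energy` and the value `a = 10` are hypothetical.

```python
def ls_site_energy(residue, contacts, a, chain_end=False):
    """Energy contribution -z_i * Delta(z(sigma_i) - z_i) of one residue.

    residue  : "L" (large) or "S" (small)
    contacts : number of non-bonded lattice contacts z_i of the residue
    a        : steric penalty parameter (a > 0)
    """
    # Maximal number of contacts tolerated without steric repulsion.
    z_max = {"L": 1, "S": 2}[residue] + (1 if chain_end else 0)
    x = z_max - contacts
    delta = 1 if x >= 0 else -a
    return -contacts * delta

# Contacts available to an interior residue at each kind of site
# in a compact structure on a square lattice.
CONTACTS = {"corner": 0, "side": 1, "core": 2}

a = 10  # illustrative value only
for residue in ("S", "L"):
    print(residue, {site: ls_site_energy(residue, z, a) for site, z in CONTACTS.items()})
```

With these assumptions an S residue is rewarded for sitting in the core, while an L residue in the core is penalized by $2a$.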




FIG. 1. (a) The most (third most) designable, (b) the second most (most) designable and (c) the third (second) most designable structures in the mean-field HP (LS) model, respectively, on a 6x6 lattice.



The results of Micheletti et al., where $a$ was set equal to $\infty$, show that the distribution of designability of structures in the LS model is very similar to that in the HP model. In fact most of the highly designable structures in one model are likewise highly designable in the other model (see Appendix). The highly designable structures in the LS model also have protein-like secondary substructure and tertiary symmetries. Three among the most designable structures in the two models are shown in Fig. 1.

Just as in the last section, we consider only compact structures and neglect the effect of the two end points of a peptide chain. Table I gives the values of $x$, $\Delta(x)$ and the Hamiltonian for the two types of residues at corner, side and core sites on a square lattice. Let $o$, $s$ and $c$ denote the number of corner, side and core sites, respectively; $n = o + s + c = N^2$ the total number of sites; and the subscripts $L$ and $S$ denote residue type; then

$$H = -s_L + 2a\,c_L - s_S - 2c_S$$
$$\;\; = 2a\,n_L - (1 + 2a)\,s_L - 2a\,o_L - n_S - c_S + o_S. \qquad (6)$$

For a given peptide sequence, $n_L$ and $n_S$ are fixed. First consider the case when the steric repulsion is strong but finite, namely, $a \gg 1$. Dropping the corner term $o_S$ one gets, for a given peptide sequence,

$$H = -(2a + 1)\,c_S + \mathrm{const.} \approx -2a\,\mathbf{p} \cdot \mathbf{s} + \mathrm{const.}, \qquad (7)$$

where $\mathbf{p}$ and $\mathbf{s}$ are the peptide and structure binary vectors defined before, with the exception that in $\mathbf{p}$ the digit 0 (1) now stands for L (S). Comparison of this equation with Eq. (4) reveals that, at least on a square lattice, the mathematical forms of the two models are essentially identical, provided that the pair H and P in the HP model is replaced by S and L, respectively. Since there is only one energy scale in either model, the size of $a$ does not matter so long as it is much greater than unity but finite.



TABLE I. Action of the Hamiltonian for the LS model on a square lattice; end points of chains are ignored and $x = z(\sigma) - z(\Gamma)$.

type   quantity   corner   side   core
S      x          2        1      0
S      Δ(x)       1        1      1
S      H          0        -1     -2
L      x          1        0      -1
L      Δ(x)       1        1      -a
L      H          0        -1     2a

When $a \to \infty$, as was the case in the work of Micheletti et al., the term $2a\,c_L$ in the first line of Eq. (6) becomes a constraint that L residues are prohibited from core sites, namely $c_L = 0$ strictly, and the rest of the Hamiltonian becomes

$$H = -c_S + o_L - n_L + o_S - n_S \approx -\mathbf{p} \cdot \mathbf{s} + \mathrm{const.}, \qquad (8)$$

which again coincides with Eq. (4).

IV. GEOMETRICAL PROPERTIES OF THE 2D 
SQUARE LATTICE 

Since Eqs. (4), (7) and (8) reduce the Hamiltonians of the mean-field HP and LS models to the same problem in geometry, namely one of the Hamming distance between the two vectors $\mathbf{s}$ and $\mathbf{p}$, we now study the space of these vectors (in the HP model). Consider an $N \times N$ square lattice with $n = N^2$ sites. Recall that every structure is a self-avoiding compact path on the lattice. The set $V$ of all binary peptides $\mathbf{p}$ is then just the set of $2^n$ binary sequences. Because of geometric constraints, the set $S \subset V$ of binary structure sequences $\mathbf{s}$ is far smaller than $V$. For a very rough estimate of the upper limit of the size of $S$, consider the construction of compact paths by random walk on the lattice. At any given point during the walk after the first step, the maximum number of allowed next steps is the coordination number minus one, which is between 2 and 3. As the number of steps taken increases, the average number of allowed next steps will decrease. We take the average number to be 2 up to the point when the lattice is half full. For a randomly chosen path, after the lattice is half full, chances are that the number of allowed next steps will be either one or zero most of the time. So the number of allowed structures $\mathbf{s}$ should be much less than $2^{n/2}$. On a 6x6 lattice this last number is 262144, whereas the size of $S$ is 30408, and the size of $V$ is $2^{36} = 68{,}719{,}476{,}736$. An example of an allowed $\mathbf{s}$ on the 6x6 lattice is shown in Fig. 2(a). If we think of $V$ as the set of all the corner points of the $n$-dimensional unit hypercube, then the set $S$ is composed of a tiny subset of corner points. It was shown earlier that the designability of an $\mathbf{s} \in S$ is the size of the Voronoi polytope of $\mathbf{s}$ in $V$; it is clear that what characterizes the designability problem is the distribution of the contents of $S$ in the unit hypercube.
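The Voronoi picture can be sketched on a toy example. The code below is a minimal illustration, not the procedure used for the 6x6 lattice: the three 4-digit "structures" are hypothetical strings rather than compact lattice paths, and the function name `designability` is an assumption. Each binary peptide is assigned to its unique nearest structure in Hamming distance, and degenerate peptides are discarded.

```python
from itertools import product

def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))

def designability(structures):
    """Count, for each structure, the binary peptides whose unique nearest
    structure (in Hamming distance) it is; tied peptides are discarded."""
    n = len(structures[0])
    counts = {s: 0 for s in structures}
    for bits in product("01", repeat=n):
        p = "".join(bits)
        dists = sorted((hamming(p, s), s) for s in structures)
        if len(dists) == 1 or dists[0][0] < dists[1][0]:
            counts[dists[0][1]] += 1
    return counts

# Hypothetical toy set; all strings have the same number of 1's, so
# minimizing -p.s is the same as minimizing the Hamming distance.
toy = ["0011", "1100", "0110"]
print(designability(toy))  # {'0011': 3, '1100': 3, '0110': 1}
```

In this toy set the string 0110 lies within Hamming distance 2 of both other strings and ends up with the smallest cell, which is the crowding effect the hypercube picture is meant to convey.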

We now examine how geometric constraints reduce $V$ down to $S$. A sequence in $V$ may be viewed as a chain of 0's and 1's connected by $n - 1$ links of three types: those connecting 0 and 0 sites, 0 and 1 or 1 and 0 sites, and 1 and 1 sites, respectively. Let the numbers of such links be $n_{00}$, $n_{10}$ and $n_{11}$, respectively. The sequence is partitioned by the 1-0 links into $n_{10} + 1$ segments of contiguous 1's or 0's. Whereas the link numbers for a $\mathbf{p}$ are devoid of geometric meaning, those for an $\mathbf{s}$ are the consequences of geometric constraints. To illustrate this, consider the case $N > 4$ (the surface to core ratios in smaller lattices are too lop-sided to be of interest). Some of the simplest constraints that must be satisfied by an allowed $\mathbf{s}$ are:

1. An isolated single 0 may only occur at an end of a path;

2. An isolated single 1 may only either occur at or be one 0-segment away from an end of a path;

3. Each of the four corners on the lattice belongs to a 0-segment at least 4 sites long, except when the corner is an end of a path;

4. For a path having the pattern $\mathbf{s} = (1 \cdots 1)$ (both the ends of the path are 1-sites), $2n_{00} + n_{10} = 8N - 8$ and $2 \le n_{10} \le 4N - 12$;

5. For $\mathbf{s} = (0010011 \cdots 1)$, $2n_{00} + n_{10} = 8N - 9$ and $5 \le n_{10} \le 4N - 11$;

6. For $\mathbf{s} = (0010011 \cdots 1100100)$, $2n_{00} + n_{10} = 8N - 10$ and $10 \le n_{10} \le 4N - 10$ if $N > 6$; the last relation is replaced by $8 \le n_{10} \le 4N - 10$ if $N < 6$;

7. For $\mathbf{s} = (0010011 \cdots 0) \ne (0010011 \cdots 1100100)$, $2n_{00} + n_{10} = 8N - 10$ and $\ldots \le n_{10} \le 4N - 12$;

8. For $\mathbf{s} = (0 \cdots 0) \ne (0010011 \cdots 0)$ and $\ne (0010011 \cdots 1100100)$, $2n_{00} + n_{10} = 8N - 10$ and $2 \le n_{10} \le 4N - 12$;

9. For $\mathbf{s} = (0 \cdots 1) \ne (0010011 \cdots 1)$, $2n_{00} + n_{10} = 8N - 9$ and $1 \le n_{10} \le 4N - 13$.

The first two rules are obvious on a square lattice. The third rule implies that the polar residues tend to accumulate around corners. This fortuitously reflects a property of real proteins: the relative abundance of polar residues on surface areas with large curvatures. Figs. 2(b) and (c) illustrate the origin of the fourth rule on a 6x6 lattice. The two structures are both of the type $(1 \cdots 1)$, that is, they both begin and end on core sites. The dark solid links in the figures define "templates" for constructing structures that respectively have the maximum (twelve) and minimum (two) values for $n_{10}$. Rules (5)-(8) can be shown in a similar way. By explicitly applying the above rules in the selection of $\mathbf{s}$ (as opposed to requiring an $\mathbf{s}$ to be a compact self-avoiding path), the total number of $2^{36} = 68{,}719{,}476{,}736$ binary sequences in $V$ is reduced to a set of 537549 candidate paths which, relatively speaking, is now only slightly greater than the exact number (30408) of structures in $S$. This implies that the set of rules given above embodies the essence of the geometric requirement that guarantees elements in $S$ to be compact self-avoiding paths.
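The link numbers are easy to compute for any binary string. As a minimal sketch (the helper name `link_counts` is an assumption), the code below counts them for the 36-digit structure sequence quoted in the caption of Fig. 2(a), which begins and ends with 0 and so should satisfy $2n_{00} + n_{10} = 8N - 10$ with $N = 6$.

```python
def link_counts(s):
    """Return (n00, n10, n11) for a binary string s, where n10 counts
    links joining unlike digits (0-1 or 1-0)."""
    n00 = n10 = n11 = 0
    for a, b in zip(s, s[1:]):
        if a != b:
            n10 += 1
        elif a == "0":
            n00 += 1
        else:
            n11 += 1
    return n00, n10, n11

N = 6
# Structure sequence of Fig. 2(a); 0 = surface site, 1 = core site.
s = "001100110000110000110011000011111100"
n00, n10, n11 = link_counts(s)
print(n00, n10, n11)                    # 13 12 10
print(n00 + n10 + n11 == N * N - 1)     # True
print(2 * n00 + n10 == 8 * N - 10)      # True
```

The sequence has 20 surface and 16 core digits, as expected for a 6x6 lattice, and its $n_{10} = 12$ sits at the upper bound $4N - 12$.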





FIG. 2. (a) A structure defined by a compact, self-avoiding path, which is in turn represented by the binary sequence (001100 110000 110000 110011 000011 111100). Black (white) discs represent surface (core) sites coded by the digit 0 (1). In (b) and (c), the dark, solid links define "templates" for constructing structures of the type $(1 \cdots 1)$ whose $n_{10}$ values are 12 and 2, respectively.

V. DISTRIBUTION OF THE ALLOWED 
STRUCTURES IN THE HYPERCUBE 

Here we show that only a small portion of the structures in $S$ have large $n_{10}$. On an $N \times N$ square lattice, there is a total of $2N^2 - 2N$ links and $N^2 - 1$ among them need to be chosen to form a structure. For the 6x6 case these numbers are 60 and 35, respectively. For the structure shown in Fig. 2(b), of the total number of 60 links on the lattice, 28 links are used to define the template (which has $n_{10} = 12$) and 17 links, marked by filled diamonds in the figure, are forbidden because they would form closed loops or connect sites which already have two links. This means that to complete an $\mathbf{s}$ from the template, one needs to select $35 - 28 = 7$ links from among $60 - 28 - 17 = 15$ links on the lattice. Hence at most $\binom{15}{7} = 6435$ structures with $n_{10} = 12$ can be constructed from the template. A similar argument shows that at most $\binom{23}{14} = 817190$ structures with $n_{10} = 2$ can be constructed from the template shown in Fig. 2(c), which has 21 predetermined links. The ratio 817190 : 6435 illustrates the point that the number of structures with high $n_{10}$ values is much smaller than the number of structures with low $n_{10}$ values.
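The counting argument can be checked with a few lines of arithmetic. In this sketch the function name `template_bound` is hypothetical, and the number of forbidden links for the second template (16) is not stated in the text; it is inferred here from the quoted bound of 817190, which equals the binomial coefficient of 23 over 14.

```python
from math import comb

N = 6
total_links = 2 * N * N - 2 * N   # 60 links on the 6x6 lattice
path_links = N * N - 1            # 35 links in a compact path

def template_bound(template_links, forbidden_links):
    """Upper bound on the number of structures that complete a template:
    choose the missing links from those neither used nor forbidden."""
    free = total_links - template_links - forbidden_links
    needed = path_links - template_links
    return comb(free, needed)

print(template_bound(28, 17))   # 6435, template with n10 = 12
print(template_bound(21, 16))   # 817190, template with n10 = 2 (16 is inferred)
print(round(template_bound(21, 16) / template_bound(28, 17)))  # 127
```

The two bounds differ by roughly two orders of magnitude, which is the sense in which structures with high $n_{10}$ are rare.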

We now give a heuristic argument showing that there is an approximate relation between the smallest possible Hamming distance $d_{\min}(\mathbf{s}_1, \mathbf{s}_2)$ between two structures $\mathbf{s}_1$ and $\mathbf{s}_2$ and the difference in the $n_{10}$ values of the two structures, $\Delta n_{10} = n_{10}(\mathbf{s}_1) - n_{10}(\mathbf{s}_2)$; for simplicity we assume that $n_{10}(\mathbf{s}_1) \ge n_{10}(\mathbf{s}_2)$. For this discussion we ignore the two end points of the structures, so that (on a square lattice) all the segments on an $\mathbf{s}$ partitioned by 0-1 links have at least two 0 or two 1 digits. We begin by considering the case when $\mathbf{s}_2 = \mathbf{s}_1$. Then both $d(\mathbf{s}_1, \mathbf{s}_2)$ and $\Delta n_{10}$ are zero. Suppose we can generate $\mathbf{s}_2$ by replacing a pair of 0's in $\mathbf{s}_1$ with a pair of 1's (while keeping in mind that in most cases such an operation would not give an $\mathbf{s}$; it would give a $\mathbf{p}$ that is not in $S$). Then $d(\mathbf{s}_1, \mathbf{s}_2) = 2$ and, depending on the position of the replaced pair of 0's in $\mathbf{s}_1$, $\Delta n_{10} = 0$ or 2. Any other pair of $\mathbf{s}_2$ and $\mathbf{s}_1$ having $\Delta n_{10} = 2$ will have $d(\mathbf{s}_1, \mathbf{s}_2) \ge 2$. Thus $d_{\min}(\mathbf{s}_1, \mathbf{s}_2)$ is 2 for $\Delta n_{10} = 2$. Similarly, if we generate $\mathbf{s}_2$ by exchanging the positions of a pair of 0's and a pair of 1's in $\mathbf{s}_1$, for example:

(···0111111110···1000000001···)
(···0111111000···1001100001···)    (9)

or

(···0111111110···1000000001···)
(···0111100110···1001100001···)    (10)

then $d(\mathbf{s}_1, \mathbf{s}_2) = 4$ and $\Delta n_{10} = 2$ (Eq. (9)) or 4 (Eq. (10)). Again any other $\mathbf{s}_2$ and $\mathbf{s}_1$ having $\Delta n_{10} = 2$ or 4 will have $d(\mathbf{s}_1, \mathbf{s}_2) \ge 4$. Thus $d_{\min}(\mathbf{s}_1, \mathbf{s}_2)$ is 4 for $\Delta n_{10} = 4$. Arguing along this line it can be shown that $d_{\min}(\mathbf{s}_1, \mathbf{s}_2)$ is approximately proportional to $\Delta n_{10}$.
FIG. 3. The Hamming distances between pairs of all the 30408 structural sequences on a 6x6 lattice. The vertical dashed lines indicate the minimal Hamming distances for different $\Delta n_{10}$.
In Fig. 3, the logarithmic distributions of the Hamming distances between pairs of structures with fixed values of $\Delta n_{10}$ are plotted for a 6x6 lattice. The relation between $d_{\min}(\mathbf{s}_1, \mathbf{s}_2)$ and $\Delta n_{10}$ is clearly displayed. Notice that all distributions peak at a Hamming distance of 15-20, with the width of the distribution decreasing monotonically with $\Delta n_{10}$.
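The two exchange examples of Eqs. (9) and (10) can be verified directly. In this minimal sketch the two displayed segments are simply concatenated, dropping the elided parts (an assumption that does not affect the differences, since the junction is the same in every string); the helper names are hypothetical.

```python
def n10(s):
    """Number of links joining unlike digits in a binary string."""
    return sum(a != b for a, b in zip(s, s[1:]))

def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))

s1 = "0111111110" + "1000000001"
s2_eq9 = "0111111000" + "1001100001"    # second string of Eq. (9)
s2_eq10 = "0111100110" + "1001100001"   # second string of Eq. (10)

for s2 in (s2_eq9, s2_eq10):
    print(hamming(s1, s2), abs(n10(s2) - n10(s1)))
# prints "4 2" and then "4 4"
```

Both exchanges cost a Hamming distance of 4, but only the second one, in which both pairs are moved into the interior of a segment, changes $n_{10}$ by 4.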
FIG. 4. Average number of neighboring structures within different Hamming distances $R_H$ for a 6x6 lattice.

FIG. 5. Designability distributions for (a) 6x6 square lattice and (b) 21-site triangular lattice. See the text for detail.

It has already been shown that the number of structures with large $n_{10}$ is much smaller than the number of structures with small $n_{10}$. Hence the former kind of structures will be even more sparsely distributed in $V$ than the latter kind. Thus, given an arbitrary $\mathbf{s}$, the chances are that most of its nearest neighbors will have relatively small $n_{10}$. An $\mathbf{s}$ with large $n_{10}$ will be farther away from its nearest neighbors than if it had a smaller $n_{10}$. This is indeed brought out in Fig. 4, where each curve plots as a function of $n_{10}$ the number of neighboring structures in $S$ within a Hamming distance $R_H$, averaged over those structures with the specified $n_{10}$. It is seen that so long as $R_H < 15$, a structure with large $n_{10}$ has far fewer nearby neighbors (in $S$) than a structure with smaller $n_{10}$. It follows that structures with large $n_{10}$ will on average have large Voronoi polytopes, hence high designabilities. Note that the approximate proportional relation between $\Delta n_{10}$ and $d_{\min}(\mathbf{s}_1, \mathbf{s}_2)$ is not expected to be limited to square lattices, although the proportionality constant is expected to depend on lattice type.
In Fig. 5 (a) and (b) the logarithmic designability is plotted as a function of $n_{10}$ for a 6x6 square lattice and a 21-site triangular lattice, respectively. The size of each disc indicates the number of structures having the specific $n_{10}$ and designability, and an open diamond indicates the average designability of all structures having the specified $n_{10}$. On the whole the average designability increases with $n_{10}$ up to near the maximum $n_{10}$. For $n_{10}$ near the maximum value it appears that the heuristic argument given above breaks down, probably partly because of boundary effects, and partly because the number of structures with the largest values of $n_{10}$ is very small (3 for $n_{10} = 14$ and 24 for $n_{10} = 13$ among the 30408 $\mathbf{s} \in S$ on a 6x6 square lattice), so that statistical fluctuations become important. The designability distributions on several other lattices were studied and the pattern shown in Fig. 5 persisted. The result is summarized in Table II, where $n_{10}^{\max}$, the maximum $n_{10}$, and $n_{10}^{*}$, the $n_{10}$ where the largest average designability occurs, are given for each lattice. In all the cases $n_{10}^{*} = n_{10}^{\max} - 2 \pm 1$. Results for three-dimensional lattices will be shown elsewhere.



VI. COMPARISON WITH REAL PROTEINS 

It has been shown that the mathematical contents of the mean-field HP model and the LS model are essentially identical. The physical (or biological) interpretations given to the two models are however entirely different. The mean-field HP model is based on the assumption that hydrophobic residues would congregate in the core as much as possible. The LS model is based on the assumption that large residues would be excluded from the core as much as possible. To see which model is closer to Nature we compare the results of the two models with real proteins by matching model peptide sequences against protein sequences culled from data banks. For either model, the model sequences are the two sets of sequences, among a total of 26,000,000 randomly sampled 36-digit binary sequences, that select the most highly designable and the least designable structures, respectively, on a 6x6 lattice.

We consider the frequency distributions of the set of sequences $\{V_\lambda \mid \lambda = h, l, S, \phi, \alpha, \beta, \phi', \alpha', \beta'\}$, where the subscript $h$ denotes the concatenated 27006 peptides mapped to the 15 most highly designable structures in the mean-field HP model; $l$, the concatenated 24134 peptide sequences mapped to the 1545 least designable structures in the mean-field HP model; $S$, the concatenated 22789 peptides mapped to the 364 most highly encodable structures in the LS model; $\phi$, the concatenated protein sequences in the PDB, converted to binary sequences based on the hydrophobicity of the residues; $\alpha$, same as $\phi$, but including only segments of protein sequences that fold to α helices; $\beta$, same as $\phi$, but including only segments of protein sequences that fold to β sheets; $\phi'$, $\alpha'$ and $\beta'$, same as $\phi$, $\alpha$ and $\beta$, respectively, except that protein sequences are converted to binary ones based on the volume of residues. The ten residues designated polar (P) are: Lys, Arg, His, Glu, Asp, Gln, Asn, Ser, Thr, Cys, and the ten residues designated as L-type residues are, in descending order of volume, Trp, Tyr, Phe, Arg, Lys, Leu, Ile, Met, His and Gln. That the HP and LS models differ in physical and biological contents is reflected in the fact that the two lists overlap poorly. This conclusion will not change if the cut-off points of either or both lists are varied slightly. The sequences $V_h$ and $V_S$ will be referred to as the most foldable peptides in the HP and LS models, respectively.
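The two partitions of the amino acids, and the conversion of a protein sequence into a binary string under each, can be sketched as follows. This is a minimal illustration using one-letter residue codes; the function names and the six-residue peptide are hypothetical, and the digit conventions (1 for H and 0 for P; 1 for S and 0 for L) follow those used for the peptide vector in the two models.

```python
# One-letter codes for the two ten-residue lists given in the text.
POLAR = set("KRHEDQNSTC")   # Lys Arg His Glu Asp Gln Asn Ser Thr Cys
LARGE = set("WYFRKLIMHQ")   # Trp Tyr Phe Arg Lys Leu Ile Met His Gln
ALL = set("ACDEFGHIKLMNPQRSTVWY")

def to_hp(seq):
    """HP conversion: 1 for hydrophobic (H), 0 for polar (P)."""
    return "".join("0" if r in POLAR else "1" for r in seq)

def to_ls(seq):
    """LS conversion: 1 for small (S), 0 for large (L)."""
    return "".join("0" if r in LARGE else "1" for r in seq)

# Residues on which the two partitions agree under the correspondence
# P <-> L and H <-> S between the two models.
print(sorted(POLAR & LARGE))                   # ['H', 'K', 'Q', 'R']
print(sorted((ALL - POLAR) & (ALL - LARGE)))   # ['A', 'G', 'P', 'V']

seq = "MKVLAE"  # hypothetical peptide, for illustration only
print(to_hp(seq), to_ls(seq))                  # 101110 001011
```

Only 8 of the 20 residue types are classified equivalently by the two partitions, which is the poor overlap referred to above.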

To compare the sequences, we employ a Cartesian coordinate representation for symbolic sequences, here applied to binary sequences. Let $\mathcal{S}$ denote the set of $2^l$ binary strings $\sigma$ of length $l$. Given a binary sequence $V_\lambda$ of length $L$ and a string length $l$ (we are interested only in cases when $L \gg l$), there is the set $\{f^{(l)}_\lambda(\sigma) \mid \sigma \in \mathcal{S}\}$ of frequencies of occurrence of the strings $\sigma$ in $V_\lambda$. The frequencies may be obtained, say, by counting while sliding a window $l$ digits wide along $V_\lambda$. The frequency depends on the ratio of 0 to 1 digits in the sequence. This ratio, $r_\lambda$, is 0.983, 1.039, 0.553, 0.960, 0.993, 0.720, 0.734, 0.917 and 0.934, respectively, for the sequences $V_\lambda$, $\lambda = h, l, S, \phi, \alpha, \beta, \phi', \alpha', \beta'$. In order to make a fair comparison of the sequences, adjustments need to be made to compensate for the disparity in the 0 to 1 ratios. For this purpose we define a normalized frequency $f'$ by

$$f'^{(l)}_\lambda(\sigma) = (r_\lambda)^{-n_\sigma}\, f^{(l)}_\lambda(\sigma), \qquad (11)$$

where $n_\sigma$ is the number of 0's in $\sigma$. Sequences in the normalized frequency set $\{f'^{(l)}_\lambda(\sigma)\}$ now have 0 to 1 ratios equal to unity.
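The sliding-window count and the compensation for the 0 to 1 ratio can be sketched as below. This is a minimal illustration under a stated assumption: the correction is taken to divide each frequency by $r^{n_0}$, where $r$ is the 0 to 1 digit ratio of the sequence and $n_0$ the number of 0's in the string. The function names and the short test sequence are hypothetical.

```python
from collections import Counter
from itertools import product

def string_frequencies(seq, l):
    """Relative frequency of every length-l binary string in seq,
    counted with a sliding window of width l."""
    windows = len(seq) - l + 1
    counts = Counter(seq[i:i + l] for i in range(windows))
    keys = ("".join(b) for b in product("01", repeat=l))
    return {k: counts[k] / windows for k in keys}

def normalized_frequencies(seq, l):
    """Divide each frequency by r**n0 (assumed form of the correction),
    with r the 0-to-1 digit ratio of seq and n0 the number of 0's in the string."""
    r = seq.count("0") / seq.count("1")
    return {s: f / r ** s.count("0") for s, f in string_frequencies(seq, l).items()}

seq = "0010011100101100"   # hypothetical binary sequence
print(string_frequencies(seq, 2))
print(normalized_frequencies(seq, 2))
```

For a sequence with equal numbers of 0's and 1's the ratio is unity and the normalized frequencies coincide with the raw ones.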

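In what follows we consider only cases when $l$ is even, $l = 2k$. Let $\mathcal{L}$ be a $2^k \times 2^k$ lattice with spacing $2^{-k}$, and $\pi$ be a one-to-one mapping from $S$ to $\mathcal{L}$, $\pi : S \to \mathcal{L}$, given by

$$ \pi(\sigma) = (x, y) = \left( \sum_{i=1}^{k} \sigma_i\, 2^{-i},\ \sum_{i=1}^{k} \sigma_{k+i}\, 2^{-(k-i+1)} \right) \qquad (12) $$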



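where $\sigma = [\sigma_1, \sigma_2, \ldots, \sigma_{2k}]$ is a string in $S$ and $(x, y)$ is a site on $\mathcal{L}$. From the set $\{f'^{(l)}_\lambda(\sigma)\}$ we define a normalized relative frequency distribution of $V_\lambda$ on the lattice $\mathcal{L}$:

$$ F^{(l)}_\lambda(x, y) \equiv F^{(l)}_\lambda(\pi(\sigma)) = \left( f'^{(l)}_\lambda(\sigma) - \bar{f}'^{(l)}_\lambda \right) / Z, \qquad (13) $$

where $\bar{f}'^{(l)}_\lambda$ is the mean frequency and $Z$ is a normalization factor given by Eq. (14).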
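
A minimal sketch of the mapping from binary strings to lattice sites is given below. It implements Eq. (12) as read here: the first $k$ digits form a binary fraction for $x$, and the last $k$ digits, taken in reverse order of significance, form a binary fraction for $y$. The function name is hypothetical.

```python
def lattice_site(sigma):
    """Map a binary string of even length l = 2k to a lattice site (x, y).

    x = sum_{i=1..k} sigma_i     * 2**(-i)
    y = sum_{i=1..k} sigma_{k+i} * 2**(-(k - i + 1))
    """
    k, remainder = divmod(len(sigma), 2)
    if remainder:
        raise ValueError("string length must be even")
    bits = [int(c) for c in sigma]
    x = sum(bits[i - 1] * 2.0 ** (-i) for i in range(1, k + 1))
    y = sum(bits[k + i - 1] * 2.0 ** (-(k - i + 1)) for i in range(1, k + 1))
    return x, y


# All 16 strings of length 4 land on distinct sites of a 4 x 4 lattice.
sites = {lattice_site(format(n, "04b")) for n in range(16)}
```

Under this reading, strings that share their leading digits in the first half, or their trailing digits in the second half, fall close together on the lattice.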



Figs. 6 and 7 show the distributions $F^{(l)}_\lambda$ for $\lambda = \phi, \alpha, \beta$ and $h$, and for $\lambda = \phi', \alpha', \beta'$ and $S$, respectively. In the figures, the magnitude of the distribution is coded into the gray scale shown at the top of the figures. From the fact that (b) and (d) in Fig. 6 have their brightest and darkest regions, respectively, at generally the same locations, it is evident that $V_h$ ((d)), the most foldable peptides in the HP model, is closest to $V_\alpha$ ((b)), the sequence that represents α helix segments in real protein sequences. In comparison, although (a) looks similar to (b), it is not so similar to (d). In particular, some of the brightest regions in (a) are dark in (d), and vice versa. In sharp contrast, (c), which represents β sheet segments in real protein sequences, is entirely different from all the other distributions in Fig. 6.

Turning to Fig. 7, it is noticed that (d), representing the most foldable peptides in the LS model, is very similar to its counterpart in the HP model, Fig. 6(d). This is as expected because the mathematical contents of the two models are essentially identical. On the other hand, (d) is very dissimilar to (a), which represents all protein sequences in PDB, but with the residues partitioned according to the LS model. This shows that the size of the residue is not the dominant factor in protein structure.

The frequency distributions shown in Figs. 6 and 7 are repeated in Figs. 8 and 9, except that the word length $l$ is now eight instead of six. This implies that the sequences $V_\lambda$ are now examined with a finer resolution. The result is similar to the $l = 6$ case: the most foldable peptides in the HP model closely resemble the α helix segments of real proteins, while the foldable peptides in the LS model do not resemble real proteins.

FIG. 6. Frequency distributions of strings of length 6 in the sequences (a) $V_\phi$, (b) $V_\alpha$, (c) $V_\beta$, and (d) $V_h$; see text for description.

FIG. 7. Frequency distributions of strings of length 6 in the sequences (a) $V_{\phi'}$, (b) $V_{\alpha'}$, (c) $V_{\beta'}$, and (d) $V_S$; see text for description.

The sequences $V_\lambda$ may be compared in a more quantitative manner through the overlap of frequency distributions:



(15) 



The overlaps $O^{(l)}_{\lambda\lambda'}$, for a number of pairs $(\lambda, \lambda')$ selected from the set $\{h, l, S, \phi, \alpha, \beta, \phi', \alpha', \beta'\}$, and for $l$ = 4 to 14, are given in Fig. 10.
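
One way to compute such an overlap is sketched below. It assumes that $Z$ normalizes each distribution to unit Euclidean norm and that the overlap is the inner product of two normalized distributions, so that it behaves like a correlation coefficient between -1 and 1. This normalization and the function names are assumptions made for illustration and may differ from the definitions in Eqs. (14) and (15).

```python
import math


def relative_distribution(freqs):
    """Subtract the mean frequency and scale to unit Euclidean norm.

    Assumption: Z in Eq. (13) is the norm of the mean-subtracted frequencies.
    """
    mean = sum(freqs.values()) / len(freqs)
    deviations = {s: f - mean for s, f in freqs.items()}
    z = math.sqrt(sum(d * d for d in deviations.values()))
    if z == 0.0:
        raise ValueError("all frequencies are equal; distribution is undefined")
    return {s: d / z for s, d in deviations.items()}


def overlap(freqs_a, freqs_b):
    """Inner product of two normalized relative frequency distributions."""
    dist_a = relative_distribution(freqs_a)
    dist_b = relative_distribution(freqs_b)
    return sum(dist_a[s] * dist_b[s] for s in dist_a)


# Toy frequency sets over the four strings of length 2.
a = {"00": 0.4, "01": 0.1, "10": 0.1, "11": 0.4}
b = {"00": 0.1, "01": 0.4, "10": 0.4, "11": 0.1}
```

With this choice, identical distributions give an overlap of 1, distributions whose favored and unfavored strings are exactly interchanged give -1, and unrelated distributions give values near zero.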



One first notices that, with the exception of $O^{(l)}_{hS}$ (■ in Fig. 10), all the overlaps approach zero as the word length $l$ increases. This is so because the resolving power of the method increases with $l$; for sufficiently large $l$, the resolution becomes so fine that any two sequences that do not have substantial and extended sequence identity will have zero overlap. That $O^{(l)}_{hS}$ has a large positive value throughout the whole range of $l$ studied is expected from the mathematical equivalence of the HP and LS
models. In Ref. jl^, the parameter a in Eq.(|^) was taken 
to be infinity to emphasize the steric constraint on the 
residues. Here we had done the same just to conform to 
Ref. On the other hand, since in the present study 
all the structures are self-avoiding paths on a discrete lat- 
tice, the steric constraint caused by the existence of the 
backbone is automatically satisfied. Therefore, so far as 
the intention of the LS model is concerned, a small and 
positive, but not infinite, value for a would have sufficed. 

FIG. 8. Frequency distributions of strings of length 8 in the sequences (a) $V_\phi$, (b) $V_\alpha$, (c) $V_\beta$, and (d) $V_h$.

FIG. 9. Frequency distributions of strings of length 8 in the sequences (a) $V_{\phi'}$, (b) $V_{\alpha'}$, (c) $V_{\beta'}$, and (d) $V_S$.

The overlap $O^{(l)}_{\phi\alpha}$ (filled △) is larger than most other overlaps for much of the range of $l$ shown in the figure. This is connected to a basic fact of proteins: α helices account for almost half of the total amount of protein sequences in PDB. The overlap drops sharply when $l > 12$ because most α helix segments are shorter than 15 residues.

Next in order of magnitude are the overlaps $O^{(l)}_{h\phi}$ and $O^{(l)}_{h\alpha}$ (filled ▽ and •); these have large positive values for the smaller values of $l$. This reveals that the mean-field HP model provides a coarse-grained description of some features of real proteins and suggests that the basic assumption of the model, that local residue-water interaction is the dominant cause of protein folding, is consistent with the mechanism for the formation of α helices. The overlaps decrease with increasing $l$ for the general reason given above. On the other hand, the negative correlation shown by the negative value of the overlap $O^{(l)}_{h\beta}$ (▽) shows that the same assumption is inconsistent with what causes the formation of β sheets. Two of the obvious reasons are: whereas most β sheets are buried in the interior of proteins, the mean-field HP model differentiates only surface from core sites but has no means of influencing the interior structure of proteins; and the stability of most β sheets depends on long-range interactions that are absent in the model.

FIG. 10. Overlap of frequency distribution functions versus word length $l$: $O^{(l)}_{\phi\alpha}$ (filled △), $O^{(l)}_{h\phi}$ (filled ▽), $O^{(l)}_{h\alpha}$ (•), $O^{(l)}_{hS}$ (■), $O^{(l)}_{S\alpha'}$ (△), $O^{(l)}_{h\beta}$ (▽), $O^{(l)}_{S\beta'}$ (○), $O^{(l)}_{S\phi'}$ (□) and $O^{(l)}_{hl}$ (◇). See text for the description of the subscripts $h$, $l$, $S$, $\phi$, $\alpha$, $\beta$, $\phi'$, $\alpha'$ and $\beta'$.

The negative values of the overlaps between $V_S$ and $V_{\phi'}$, $V_{\alpha'}$ and $V_{\beta'}$ (□, △ and ○, respectively) indicate that the highly foldable peptide sequences in the LS model are anti-correlated with the real protein sequences for $l < 6$ and uncorrelated for larger $l$. This confirms what is already seen in Figs. 7 and 9: that the size effect is not the dominant factor determining the formation of a stable protein conformation. Finally, the large negative values of the overlap $O^{(l)}_{hl}$ (◇) for all values of $l$ tested simply verify that the most and least foldable peptides in the HP model are highly dissimilar however they are compared.

VII. DISCUSSION 

Because conformation designability in protein structure refers to the natural selection of a very small number of topological classes of native conformations over the vast total number of classes, it is a topic that can be suitably studied in coarse-grained settings such as lattice models. Previous lattice model studies have firmly established that indeed only a very small number of (model) structures, out of a very large total number, are highly designable. It has not been shown why this phenomenon should arise, or to what classes of native conformations the highly designable structures would correspond. In
this paper, taking advantage of the geometric picture for 
the designability problem given in [Q, namely that des- 
ignability of a structure in the mean-field HP model is 
proportional to the Voronoi volume of that structure in a certain hyperspace, we showed that uneven designability arises because one type of structure, namely those with the largest numbers of surface-core switchbacks, is very rare, and that the nearest neighbors of such structures in the hyperspace are other similarly rare structures. Hence such structures have the largest Voronoi volumes and the highest designabilities. Because the hyperspace of structures has properties independent of the two-dimensional lattices used in the present study, this conclusion is expected to stand for other more realistic lattices. Indeed, the same effect was observed on a three-dimensional lattice based on an icosahedron.

The identification of structures having the largest numbers of surface-core switchbacks with the conformation classes of observed proteins entails certain physical and biological implications. Proteins choosing such structures as native conformations would tend to have ratios of numbers of H-type to P-type residues close to unity. Indeed, the averages of H to P ratios for all the protein sequences in PDB, for the segments that fold to α helices and for those that fold to β sheets, respectively, are all very close to unity. Proteins having structures with many surface-core switchbacks are expected to be energetically favored, for such proteins would by and large have alternating P and H residues that match the pattern of the structures, and the outward-pointing force exerted on the P-type residues and the inward-pointing force exerted on the H-type residues together would make the protein especially sturdy.

On the mean-field HP lattice, high-designability structures tend not to have long sequences of contiguous sites that are purely core sites or purely surface sites (see Table III in the Appendix), because such structures tend to be involved in degenerate cases: peptides with corresponding contiguous subsequences of P- or H-type residues (or S- or L-type residues in the LS model) would easily have two or more such structures as ground states, and for that reason the peptide and the degenerate structures would have been excluded from the set of allowed peptides and acceptable structures, respectively. This practice is justified biologically: peptides and conformations involved in degeneracy (in a coarse-grained sense) are presumably filtered out by evolution because they would make for functionally unreliable proteins. In fact, relatively few proteins in PDB have sequences containing long segments of contiguous P- or H-type residues whose native conformations have long segments of contiguous surface or buried sites [23]. Such native conformations are presumably generated by the finer details of inter-residue interactions, and the conformation classes to which they belong would not have counterparts among the high-designability structures given by simple, coarse-grained lattice models.

Because structures on square lattices are not realistic enough for direct comparison with empirically observed topological conformation classes, we compared model peptides folding into such structures, namely the most foldable peptides, with (binarized) peptide sequences in the PDB. If the highly designable structures are rich in surface-core switchbacks then the highly foldable peptides should be rich in H and P singlets and HH and PP doublets. In Table III in the Appendix it is seen that the highly foldable peptides in the mean-field HP model are rich in HHPP (or PPHH) but poor in HP (or PH) repeats. This reflects an artifact of the square lattice. On such lattices, the shortest surface-core switchback motif is surface-surface-core-core (or core-core-surface-surface) repeats while surface-core repeats do not exist (see first
two "constraints" in Section |^). We showed that the 
most foldable peptides match well with those segments of protein sequences in PDB that fold into α helices but match relatively poorly with segments that fold into β sheets. α helices are most commonly amphipathic and lie on the outside of their host proteins. With 3.6 residues per turn, such α helices tend to change from H to P residues with a periodicity of three to four. That is, they should have a predominance of alternating HH and PP doublets interspersed with H and P singlets. Indeed, of all peptide sequences that code for α helices in the PDB, 24% of H to P (or P to H) changes are after singlets, 36% are after doublets and 22% are after triplets. This implies that α helices are relatively rich in HHPP repeats and this could explain why the most foldable model peptides (in the mean-field HP model) match well with α helices.
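
Statistics of the kind quoted above (the fraction of H to P or P to H changes that follow singlets, doublets or triplets) can be gathered from a binarized sequence with a few lines of code. The sketch below is illustrative only: the function name is hypothetical, and the choice to count the first run but not the last (which is not followed by a change) is an assumption about how run lengths are tallied.

```python
from collections import Counter
from itertools import groupby


def switch_run_lengths(seq):
    """Fraction of H<->P changes that occur after runs of length 1, 2, 3, ...

    seq is a string over two symbols, e.g. 'H' and 'P'.
    """
    runs = [len(list(group)) for _, group in groupby(seq)]
    completed = runs[:-1]  # the final run is not followed by a change
    if not completed:
        return {}
    counts = Counter(completed)
    return {n: c / len(completed) for n, c in sorted(counts.items())}


# Toy sequence with runs HH, PP, H, PPP, HH.
example = switch_run_lengths("HHPPHPPPHH")
```

For the toy sequence, half of the changes follow doublets and a quarter each follow singlets and triplets.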

The situation is different with respect to β sheets. The most common domain structures in proteins are α/β domains that consist of a central group of β sheets surrounded by α helices. The β sheets in these domains will not be rich in either HHPP or HP repeats. In the second large group of protein domain structures, comprised of antiparallel β sheets, some of the sheets are on the outside of the protein and these are rich in HP repeats but not in HHPP repeats. A superfamily of proteins containing such β sheets has members such as the human plasma retinol-binding protein and β-lactoglobulin, a protein that is abundant in milk. Of all peptide sequences that code for β sheets in the PDB, 33% of H to P (or P to H) changes are after singlets, 28% are after doublets and 18% are after triplets. Hence the most foldable model peptides would match poorly with β sheets.

If our computation were carried out on a lattice that allowed structures with surface-core repeats then the foldable model peptides would have better matched sequences coding for β sheets. Still, because the only interaction taken into account in the mean-field HP model is the hydrophobicity of the residues, whereas the formation of the majority of β sheets depends on other details of inter-residue interactions, we cannot expect the most foldable model peptides to have a good match with the majority of β sheets irrespective of what lattice is used.

If hydrophobicity but not inter-residue interaction is
indeed the main force that drives the formation of α
helices, then we can better understand why α helices
are formed on a time scale of the order lO^^s p^ , p5[ , 
right after the collapse of the protein to globular shape, 
and why it takes ten times longer for the formation of β
sheets, which involves interactions between residues
distantly separated on the primary structure. This scenario
is consistent with the finding in a recent statistical analy- 
sis of experimental data: local contacts play the key role 
in fast processes during folding p^ . 

We have shown that the mathematical content of the LS model, which partitions residues into large (L) and small (S) ones, is essentially the same as that of the mean-field HP model. Hence the binary compositions of the most foldable peptides in the two models are quite similar (see Table III, Appendix). However, because not all large (small) residues are hydrophilic (hydrophobic), the most foldable peptides in the two models are mapped to significantly different sets of (binarized) protein sequences. The result is that the most foldable peptides in the LS model do not match well with any subset of proteins in the PDB. This means that the steric hindrance effect arising from the different sizes of the residues is not the main driving force for protein folding.

APPENDIX

Here we show how the two lattice models differ by comparing strings of several lengths that have the highest and lowest frequencies of occurrence, called the most and least favored strings, respectively, in the sequences $V_h$ and $V_S$, which are the concatenated sequences of the most highly foldable peptides in the mean-field HP and LS models, respectively. In Table III, the first and sixth columns list such strings. Strings of different lengths are ranked separately by their normalized relative frequency of occurrence (Eq. (13)); the string with the highest (lowest) frequency is ranked 1 ($2^l$). By definition, an unfavored string has negative frequency. Table III shows that the most favored strings are quite well correlated in the two models but the least favored strings are not. It is seen that among 4-mers the repeats (0011) are the most favored pattern in both models, long repeats of 1's and 0's are the least favored string patterns in the HP model, and (01) is the least favored string repeat in the LS model. The reason for this is clear: (0011) repeats are the favored pattern in most highly designable structures in both models and each of the (peptide) strings (0000), (1111) and (0101) is separated from (0011) by the greatest frame-independent Hamming distance. There is an additional disincentive for a peptide to have (01) repeats in the LS model. On a square lattice such repeats do not appear in a structure sequence; hence, with L-type residues (represented by 0 digits) strictly forbidden on core sites (represented by 1 digits), a peptide string with 01 repeats can only occupy a structure sequence composed entirely of surface sites. This gives the peptide zero binding energy in the LS model. The situation is different in the HP model. There a peptide string with 01 repeats can occupy a structure sequence with 0011 repeats and have non-zero binding energy.
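
The statement about Hamming distances can be checked with a small sketch. It assumes that the frame-independent Hamming distance between two repeat patterns is the minimum Hamming distance over all cyclic shifts of one pattern relative to the other; this definition and the function name are assumptions made for illustration.

```python
def frame_independent_hamming(a, b):
    """Minimum Hamming distance between a and any cyclic shift of b.

    Assumed definition of the frame-independent distance between
    two repeat patterns of equal length.
    """
    if len(a) != len(b):
        raise ValueError("patterns must have equal length")
    return min(
        sum(x != y for x, y in zip(a, b[shift:] + b[:shift]))
        for shift in range(len(b))
    )


# Distances from the favored pattern 0011 to every 4-digit pattern.
distances = {
    format(n, "04b"): frame_independent_hamming("0011", format(n, "04b"))
    for n in range(16)
}
```

Under this definition, the patterns (0000), (1111) and (0101) all lie at distance 2 from (0011), which is the largest value attained by any 4-digit pattern.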

TABLE III. Strings most and least favored in the mean-field HP and LS models. Strings of different lengths are ranked separately; e.g., the least favored string of length 4 is ranked $2^4 = 16$.

Strings most/least      HP model        LS model   | Strings most/least      LS model        HP model
favored in HP model    freq.  rank     freq.  rank | favored in LS model    freq.  rank     freq.  rank




1 

J. 




1 n 






1 

J. 


n 4979 


9 
z 




0.4272 


2 




1 


(^ 1 00^ 


0.3693 


2 


0.4224 


3 


(0000)               -0.3883    15    0.2732     3 | (1010)               -0.3815    15   -0.1572    11
(1111)               -0.3903    16    0.0109     9 | (0101)               -0.3892    16   -0.1594    12


(Tim 1 nn^ 


4fin5 


1 


n 9fiQ4 


1 


('001 1 00^ 


9fiQ4 


1 


4f!05 


\ 


(m 1 001 

l^U J- J-UU -L J 


n 974R 


9 




90 

Z\J 


("00001 1 

l^UUUU-L -L J 


n 9RQ4 


9 
z 


n n^i 

\}.\jO l-O 


1 8 


n 001 1 C\\ 


n 9RQ8 


o 


u.uu / z 


1 Q 


(■] 1 0000^ 

^^ -L -LUUUU J 




T 
O 


u.uouy 


9'^ 
zo 


(Tinnnni ^ 


-fl 1 795 


fi9 




22 


n 01 01 0^ 

^^ -LU-LU-LU J 


918fi 


fi9 

\JZ 


-0 1 95"^ 


58 

OQ) 


n nnnnn^ 

-LUUUUU J 


n 1 741 


uo 


n n^s5 


21 


(m 01 01 ^ 

^^U-LU-LU-L J 


n 9999 
-\J .zzzz 


uo 


n 1 9^4 


^7 

O 1 


^UUUUUU J 


-0.2694 


64 


n 0974 
Kj.yjz t ^ 


95 
zo 


('001 01 0^1 


-0.2224 


64 


-0.0589 


39 




0.2101 


\ 


U.IUIO 


1 Q 


1 1 nnnni i ^ 


9*^1 8 

U.ZiO J-O 




U. J-O 1 o 


4 


(m 1 001 1 




9 




51 


['00001 1 00"! 


91 41 


9 
z 


n 1 ^^9 


1 

LO 


n 1 001 1 C\C\\ 


1Q77 


•x 
o 


n 1 noi 


90 

Z\J 


('noi 1 0000"! 


0.2110 


Q 
O 


1 1 Q1 

u. J- J- y -L 


9*^ 
zo 


n 1 00001 1 ~1 




4 


0.2318 


I 


/'noi 1 1 1 oo~^ 




A 




9nn 


f 000000 1 1 ] 


-\).y),}— i 


9^'! 


()9();; 




((]](]](]] (](]] 


ll'IX'l 


Z' )r > 


11 111 1 


1 s;fl 

i OU 


('OOOOOOOl ~t 


101^ 


254 


U.UOU-L 


79 


on 01 001 0^ 


1 OOR 


254 


04 1 


1 88 

-LOO 


n 0000000") 


-yj . ±\jZiO 


955 
zoo 


0.0334 


63 


foiooioio^i 


101 


955 
zoo 


04 "^fi 


194 




-0.1060 


256 


n nnss 


Q4 
y^ 




-0.1017 


256 


-0.0379 


172 


(0011001100)          0.1682     1     0.902    14 | (0011000011)          0.1837     1    0.1400     4
(1100001100)          0.1574     2    0.1830     2 | (1100001100)          0.1830     2    0.1574     2
(0110000110)          0.1548     3    0.1335     3 | (0110000110)          0.1335     3    0.1548     3
(0011000011)          0.1400     4    0.1837     1 | (1001100001)          0.1230     4    0.1211     8
(1111000000)         -0.0408  1021    0.0220   214 | (0101001010)         -0.0441  1021   -0.0173   693
(1110000000)         -0.0414  1022    0.0508    58 | (0100001010)          -0.440  1022   -0.0102   528
(0000000000)         -0.0426  1023   -0.0219   773 | (0101010101)         -0.0444  1023    0.0268   893
(1111111111)         -0.0427  1024   -0.0358   914 | (1010101010)         -0.0446  1024    0.0250   869